Goto

Collaborating Authors

 negative log


Curriculum Guided Massive Multi Agent System Solving For Robust Long Horizon Tasks

Kar, Indrajit, Kumar, Kalathur Chenchu Kishore

arXiv.org Artificial Intelligence

Large Language Models and multi-agent systems have shown promise in decomposing complex tasks, yet they struggle with long-horizon reasoning tasks and escalating computation cost. This work introduces a hierarchical multi-agent architecture that distributes reasoning across a 64*64 grid of lightweight agents, supported by a selective oracle. A spatial curriculum progressively expands the operational region of the grid, ensuring that agents master easier central tasks before tackling harder peripheral ones. To improve reliability, the system integrates Negative Log-Likelihood as a measure of confidence, allowing the curriculum to prioritize regions where agents are both accurate and well calibrated. A Thompson Sampling curriculum manager adaptively chooses training zones based on competence and NLL-driven reward signals. We evaluate the approach on a spatially grounded Tower of Hanoi benchmark, which mirrors the long-horizon structure of many robotic manipulation and planning tasks. Results demonstrate improved stability, reduced oracle usage, and stronger long-range reasoning from distributed agent cooperation.



softmax-classifiers-explained

#artificialintelligence

Last week, we discussed Multi-class SVM loss; specifically, the hinge loss and squared hinge loss functions. In reality, these values would not be randomly generated -- they would instead be the output of your scoring function f. Let's exponentiate the output of the scoring function, yielding our unnormalized probabilities: Figure 2: Exponentiating the output values from the scoring function gives us our unnormalized probabilities. Figure 4: Taking the negative log of the probability for the correct ground-truth class yields the final loss for the data point. To examine some actual probabilities, let's loop over a few randomly sampled training examples and examine the output probabilities returned by the classifier: Note: I'm randomly sampling from the training data rather than the testing data to demonstrate that there should be a noticeably large gap in between the probabilities for each class label.